Similar resources
The Factored Policy Gradient planner (IPC-06 Version)
We present the Factored Policy Gradient (FPG) planner: a probabilistic temporal planner designed to scale to large planning domains by applying two significant approximations. Firstly, we use a “direct” policy search in the sense that we attempt to directly optimise a parameterised plan using gradient ascent. Secondly, the policy is factored into a per-action mapping from a partial observation ...
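To make the "direct policy search" idea concrete, here is a minimal, self-contained sketch: a softmax policy with one weight vector per action, trained by REINFORCE-style gradient ascent on a toy simulator. Everything here (the environment toy_step, the sizes, the learning rate) is illustrative and not FPG's actual factored eligibility policies.

```python
# Sketch of direct policy search by gradient ascent, not FPG itself.
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_obs = 3, 4

# Factored parameterisation: one weight vector per action (plus bias).
theta = rng.normal(scale=0.1, size=(n_actions, n_obs + 1))

def policy_probs(obs):
    """Per-action logits from the factored parameters, combined by softmax."""
    x = np.append(obs, 1.0)              # bias feature
    logits = theta @ x
    e = np.exp(logits - logits.max())
    return e / e.sum(), x

def run_episode(env_step, horizon=20):
    """Sample one episode; collect total reward and log-prob gradients."""
    obs = np.zeros(n_obs)
    grads, total_r = [], 0.0
    for _ in range(horizon):
        probs, x = policy_probs(obs)
        a = rng.choice(n_actions, p=probs)
        g = -np.outer(probs, x)          # grad of log pi(a|obs) w.r.t. theta
        g[a] += x                        # (softmax-linear case)
        grads.append(g)
        obs, r = env_step(obs, a)
        total_r += r
    return total_r, grads

def toy_step(obs, a):
    """Hypothetical stand-in for a planning simulator."""
    nxt = np.roll(obs, 1)
    nxt[0] = a / (n_actions - 1)
    return nxt, float(a == 0)            # action 0 is rewarded

alpha = 0.05
for _ in range(200):                     # gradient ascent on expected reward
    R, grads = run_episode(toy_step)
    for g in grads:
        theta += alpha * R * g
```

The point of the sketch is the update rule: the plan is never searched combinatorially; its parameters are nudged along a sampled estimate of the reward gradient.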
FF + FPG: Guiding a Policy-Gradient Planner
The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG’s weakness is potentially long learning time...
Policy Iteration for Factored MDPs
Many large MDPs can be represented compactly using a dynamic Bayesian network. Although the structure of the value function does not retain the structure of the process, recent work has suggested that value functions in factored MDPs can often be approximated well using a factored value function: a linear combination of restricted basis functions, each of which refers only to a small subset ...
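The representation this abstract points to can be sketched in a few lines: V(s) is a weighted sum of basis functions, each reading only a small subset of the state variables, so V is evaluated without ever enumerating the exponential state space. The bases, scopes and weights below are invented purely for illustration.

```python
# Hedged sketch of a factored linear value function, with made-up bases.
import numpy as np

state = np.array([1, 0, 1, 1])   # four binary state variables

# Each basis function touches only the variables listed in its scope.
basis = [
    (lambda s: float(s[0]),          [0]),      # h1 depends on X0
    (lambda s: float(s[1] and s[2]), [1, 2]),   # h2 depends on X1, X2
    (lambda s: float(s[3]),          [3]),      # h3 depends on X3
]
weights = np.array([2.0, -1.0, 0.5])

def value(s):
    """V(s) = sum_i w_i * h_i(s), using only each basis's small scope."""
    return sum(w * h(s) for (h, _scope), w in zip(basis, weights))

print(value(state))   # 2.0*1 - 1.0*0 + 0.5*1 = 2.5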
Factored Contextual Policy Search with Bayesian Optimization
Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different "contexts". Bayesian optimization approaches to contextual policy search (CPS) offer data-efficient policy learning that generalizes over a context space. We propose to improve data-efficiency by factoring typically considered contexts into two compon...
Online Symbolic Gradient-Based Optimization for Factored Action MDPs
This paper investigates online stochastic planning for problems with large factored state and action spaces. We introduce a novel algorithm that builds a symbolic representation capturing an approximation of the action-value Q-function in terms of action variables, and then performs gradient-based search to select an action for the current state. The algorithm can be seen as a symbolic extensio...
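One common way to realise gradient-based action selection over a factored action space is to relax the binary action variables to [0, 1], ascend a differentiable surrogate of Q, and round the result back to a discrete action. The sketch below follows that assumption; the concave quadratic Q is a toy stand-in, not the symbolic representation the paper constructs.

```python
# Hedged sketch: projected gradient ascent over relaxed action variables.
import numpy as np

rng = np.random.default_rng(1)
n_vars = 5
A = rng.normal(size=(n_vars, n_vars))
Q_mat = -(A @ A.T)                  # concave toy Q(a) = a^T Q a + b^T a
b = rng.normal(size=n_vars)

def q_value(a):
    return a @ Q_mat @ a + b @ a

def q_grad(a):
    return (Q_mat + Q_mat.T) @ a + b

a = np.full(n_vars, 0.5)            # start from the relaxed midpoint
for _ in range(100):                # projected gradient ascent
    a = np.clip(a + 0.05 * q_grad(a), 0.0, 1.0)

action = (a > 0.5).astype(int)      # round back to a discrete action
print(action, q_value(a))
```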
Journal
Journal title: Artificial Intelligence
Year: 2009
ISSN: 0004-3702
DOI: 10.1016/j.artint.2008.11.008